
DAOS-18610 tests: Fix container/boundary timeouts #18608

Draft
liw wants to merge 1 commit into master from liw/boundary-max_workers



liw commented Jul 3, 2026

Contributor

The test submits 30000 tasks to a concurrent.futures.ThreadPoolExecutor. Each task creates a container and sleeps 2 s. When max_workers is not given, el8's Python 3.6 lets this ThreadPoolExecutor run (os.cpu_count() or 1) * 5 threads at the same time, whereas el9's Python 3.9 allows only min(32, (os.cpu_count() or 1) + 4). Log messages produced by my debug PR confirm the latter to be 32 on the test nodes. That implies os.cpu_count() is at least 28, so the el8 default would be at least 28 * 5 = 140 threads, far more than 32. Counting only the sleep in each task, finishing all 30000 tasks takes at least 2 * (30000 / 32) = 1875 s, which already exceeds the test timeout of 1200 s.
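The two default pool sizes and the sleep-only lower bound can be checked with a short sketch. The formulas mirror the defaults quoted above for Python 3.5-3.7 and 3.8+; the function names are hypothetical, and the bound ignores the time spent creating containers, so a real run can only be slower.

```python
def default_max_workers_py36(cpu_count):
    """ThreadPoolExecutor default on Python 3.5-3.7 (el8's 3.6)."""
    return (cpu_count or 1) * 5


def default_max_workers_py38(cpu_count):
    """ThreadPoolExecutor default on Python 3.8+ (el9's 3.9)."""
    return min(32, (cpu_count or 1) + 4)


def sleep_lower_bound(num_tasks, sleep_s, workers):
    """Minimum wall time if each task only slept: sleep_s * num_tasks / workers."""
    return sleep_s * num_tasks / workers


if __name__ == "__main__":
    # A default of 32 on el9 means cpu_count >= 28, so el8 would allow >= 140.
    print(default_max_workers_py38(28), default_max_workers_py36(28))  # 32 140
    print(sleep_lower_bound(30000, 2, 32))  # 1875.0 s, above the 1200 s timeout
```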

This patch passes max_workers explicitly so that more of the 2 s sleeps overlap, limiting the total delay they cause.
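A minimal sketch of the fix, assuming the per-task work can be modelled as a plain sleep. `create_container_and_sleep` and `run_all` are hypothetical names, not the test's code, and the max_workers values are illustrative only; this description does not state the value the patch uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def create_container_and_sleep(index, sleep_s):
    # Hypothetical stand-in: the real test creates a DAOS container here.
    time.sleep(sleep_s)
    return index


def run_all(num_tasks, sleep_s, max_workers):
    """Run num_tasks jobs on at most max_workers threads; return (results, elapsed_s)."""
    start = time.monotonic()
    # An explicit max_workers replaces the interpreter-dependent default.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(create_container_and_sleep,
                                    range(num_tasks), [sleep_s] * num_tasks))
    return results, time.monotonic() - start


if __name__ == "__main__":
    # Scaled-down run: 100 tasks of 0.02 s each.
    _, slow = run_all(100, 0.02, max_workers=5)    # 20 sequential waves, >= 0.4 s
    _, fast = run_all(100, 0.02, max_workers=100)  # roughly one wave
    print(f"5 workers: {slow:.2f} s, 100 workers: {fast:.2f} s")
```

The choice of max_workers is a trade-off: a larger pool overlaps more of the sleeps, but also issues more container creations concurrently against the system under test.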

Test-tag: pr test_container_boundary

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).


github-actions Bot commented Jul 3, 2026


Ticket title is 'container/boundary.py:BoundaryTest.test_container_boundary - test timeout creating containers'
Status is 'In Progress'
Labels: '2.6.5.p1rc1,2.6.5rc1,2.6.5rc2,2.6.5rc3,2.8.0rc1,ci_2.6_weekly,ci_master_weekly,weekly_test'
https://daosio.atlassian.net/browse/DAOS-18610

liw force-pushed the liw/boundary-max_workers branch from bc35460 to 87ae38d on July 3, 2026 02:21
Signed-off-by: Li Wei <liwei@hpe.com>
liw force-pushed the liw/boundary-max_workers branch from 87ae38d to 1a14a63 on July 3, 2026 03:05

Labels

None yet


1 participant